Sign Stable Random Projections for Large-Scale Learning

Author

  • Ping Li
Abstract

In this paper, we study the use of "sign α-stable random projections" (where 0 < α ≤ 2) for building basic data processing tools in the context of large-scale machine learning applications (e.g., classification, regression, clustering, and near-neighbor search). After processing by sign stable random projections, the inner products of the processed data approximate various types of nonlinear kernels depending on the value of α. Thus, this approach provides an effective strategy for approximating nonlinear learning algorithms essentially at the cost of linear learning. When α = 2, it is known that the corresponding nonlinear kernel is the arc-cosine kernel. When α = 1, the procedure approximates the arc-cos-χ kernel (under certain conditions). When α → 0+, it corresponds to the resemblance kernel, which provides an exciting connection between two popular randomized algorithms: (i) stable random projections and (ii) b-bit minwise hashing. No theoretical results are known so far for other α values except for α = 2, 1, or 0+.
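As a quick illustrative sketch (not the paper's implementation), the procedure can be demonstrated in NumPy: project the data onto random α-stable directions, keep only the signs, and compare sign agreement across vectors. For α = 2 (Gaussian projections), the signs of two vectors differ with probability θ/π, where θ is the angle between them, which is the arc-cosine connection mentioned above. The function name and the Chambers–Mallows–Stuck sampler for general α are this sketch's own choices.

```python
import numpy as np

def sign_stable_projections(X, k, alpha=2.0, seed=0):
    """Project rows of X onto k alpha-stable random directions, keep only signs.

    alpha = 2 -> Gaussian projections; alpha = 1 -> Cauchy projections.
    Other alpha in (0, 2) use the Chambers-Mallows-Stuck sampler.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    if alpha == 2.0:
        R = rng.standard_normal((d, k))
    elif alpha == 1.0:
        R = rng.standard_cauchy((d, k))
    else:
        # Chambers-Mallows-Stuck sampler for a symmetric alpha-stable law
        U = rng.uniform(-np.pi / 2, np.pi / 2, (d, k))
        W = rng.exponential(1.0, (d, k))
        R = (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
             * (np.cos(U - alpha * U) / W) ** ((1 - alpha) / alpha))
    return np.sign(X @ R)

# Two vectors at a 45-degree angle: for alpha = 2, the signs agree with
# probability 1 - theta/pi = 1 - 0.25 = 0.75.
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])
S = sign_stable_projections(X, k=20000, alpha=2.0)
collision_rate = float(np.mean(S[0] == S[1]))
```

With k = 20,000 sign bits, the empirical agreement rate concentrates tightly around 0.75, so a linear method run on the sign bits behaves like a kernel method on the original data.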


Related articles

Sign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels

The method of stable random projections is popular for efficiently computing the lα distances in high dimension (where 0 < α ≤ 2), using small space. Because it adopts nonadaptive linear projections, this method is naturally suitable when the data are collected in a dynamic streaming fashion (i.e., turnstile data streams). In this paper, we propose to use only the signs of the projected data an...


Sign Cauchy Projections and Chi-Square Kernel

The method of stable random projections is useful for efficiently approximating the lα distance (0 < α ≤ 2) in high dimension and it is naturally suitable for data streams. In this paper, we propose to use only the signs of the projected data and we analyze the probability of collision (i.e., when the two signs differ). Interestingly, when α = 1 (i.e., Cauchy random projections), we show that t...


Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections

The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the lα norm of a single data stream, or the lα distance between a pair of streams, for any 0 < α ≤ 2. [18] and [20] applied stable random pro...


Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second,...


Using Stable Random Projections

Many tasks (e.g., clustering) in machine learning only require the lα distances instead of the original data. For dimension reductions in the lα norm (0 < α ≤ 2), the method of stable random projections can efficiently compute the lα distances in massive datasets (e.g., the Web or massive data streams) in one pass of the data. The estimation task for stable random projections has been ...



Journal:
  • CoRR

Volume: abs/1504.07235  Issue: 

Pages: -

Publication date: 2015